Race batch PDLP and dual simplex in strong branching / reliability branching #994
Conversation
…+ fail-safe if batch pdlp fails
…te_lp_problem' into race_strong_branching_realibility_branching
…ng, correctly fill the ds_obj objective before merging results at the root, correctly clamp the PDLP objective, remove the unnecessary cuopt_assert regarding fixed-point error
/ok to test 0d5ae1c

@coderabbitai review

✅ Actions performed: Review triggered.

/ok to test 7504340

/ok to test a7f79fa

/ok to test dd07576

@coderabbitai full review

✅ Actions performed: Full review triggered.
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
cpp/src/branch_and_bound/pseudo_costs.cpp (1)
547-584: ⚠️ Potential issue | 🟠 Major

Assert or remap the slack-column suffix invariant.

`remove_columns(cols_to_remove)` treats `new_slacks` as arbitrary column indices, but lines 560-584 still feed the first `n` entries of `root_soln`, `objective`, `lower`, and `upper` into PDLP. That only stays aligned if every slack index already lives in `[n, lp.num_cols)`. Otherwise the compacted matrix, the vectors, and the branching indices no longer describe the same columns.

Minimal guard if appended slacks are the intended invariant:

```diff
 cuopt::mps_parser::mps_data_model_t<i_t, f_t> mps_model;
 int m = lp.num_rows;
 int n = lp.num_cols - new_slacks.size();
 original_root_soln_x.resize(n);
+for (i_t slack : new_slacks) {
+  assert(slack >= n &&
+         "simplex_problem_to_mps_data_model assumes slack columns are appended");
+}
```

If that invariant is not guaranteed, this helper needs a real old→new column map and the callers need to remap the candidate indices as well.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/branch_and_bound/pseudo_costs.cpp` around lines 547 - 584: The code removes columns via A_no_slacks.remove_columns(cols_to_remove) but still blindly slices the first n entries of root_soln, lp.objective, lp.lower, lp.upper into PDLP (original_root_soln_x and mps_model.set_* calls), which is only correct if all slack indices in new_slacks are appended (>= n); add an explicit invariant check or build a proper old→new column map: either assert all new_slacks[j] >= n before proceeding (and document the invariant), or construct a mapping from original column index → compacted index after remove_columns and use it to remap original_root_soln_x, the objective vector, lower/upper bound arrays, and any branching candidate indices so the CSR matrix and these vectors stay aligned (refer to cols_to_remove, A_no_slacks, original_root_soln_x, mps_model.set_objective_coefficients, set_variable_lower_bounds, set_variable_upper_bounds).

cpp/src/pdlp/pdlp.cu (1)
801-864: ⚠️ Potential issue | 🟠 Major

Mark the last resized-batch climbers as solved before this return.

Unlike line 865, this early-return branch never calls `sb_view_.mark_solved(...)` for the climbers still left in the batch. Any Dual Simplex task that has not started those branches yet will keep solving them even though PDLP already finished them, so the final chunk cannot be stolen/skipped.

Suggested fix:

```diff
 if (original_batch_size != climber_strategies_.size()) {
 #ifdef BATCH_VERBOSE_MODE
   std::cout << "Original batch size was " << original_batch_size << " but is now "
             << climber_strategies_.size() << std::endl;
 #endif
@@
     if (current_termination_strategy_.get_termination_status(i) !=
         pdlp_termination_status_t::ConcurrentLimit) {
       batch_solution_to_return_
         .get_additional_termination_informations()[climber_strategies_[i].original_index]
         .solved_by = method_t::PDLP;
     }
   }
+  if (sb_view_.is_valid()) {
+    for (size_t i = 0; i < climber_strategies_.size(); ++i) {
+      sb_view_.mark_solved(climber_strategies_[i].original_index);
+    }
+  }
   current_termination_strategy_.fill_gpu_terms_stats(total_pdlp_iterations_);
   RAFT_CUDA_TRY(cudaStreamSynchronize(stream_view_));
   current_termination_strategy_.convert_gpu_terms_stats_to_host(
     batch_solution_to_return_.get_additional_termination_informations());
   return optimization_problem_solution_t<i_t, f_t>{
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/pdlp/pdlp.cu` around lines 801 - 864: This early-return path fails to notify the solver bookkeeping that the remaining resized-batch climbers are finished; before returning, iterate over current_termination_strategy_.get_terminations_status().size() and call sb_view_.mark_solved(...) for each climber using climber_strategies_[i].original_index (and the appropriate termination/solution metadata you have just copied into batch_solution_to_return_), mirroring the behavior used at the later return (line 865) so in-progress Dual Simplex tasks can stop/skip those chunks.
♻️ Duplicate comments (6)
cpp/src/branch_and_bound/pseudo_costs.hpp (1)
416-425: ⚠️ Potential issue | 🟠 Major

Invalidate the PDLP warm-start cache when the LP context changes.

Once `populated` flips true, this struct has no reset path, so reusing the same `pseudo_costs_t` after cuts/slacks/original-LP changes can replay stale warm-start vectors and scalars into a different batch LP. Add a `reset()` helper here and call it from the context-reset path.

Suggested fix:

```diff
 struct batch_pdlp_warm_cache_t {
   const raft::handle_t batch_pdlp_handle{};
   rmm::device_uvector<f_t> initial_primal{0, batch_pdlp_handle.get_stream()};
   rmm::device_uvector<f_t> initial_dual{0, batch_pdlp_handle.get_stream()};
   f_t step_size{std::numeric_limits<f_t>::signaling_NaN()};
   f_t primal_weight{std::numeric_limits<f_t>::signaling_NaN()};
   i_t pdlp_iteration{-1};
   f_t percent_solved_by_batch_pdlp_at_root{f_t(0.0)};
   bool populated{false};
+
+  void reset()
+  {
+    initial_primal.resize(0, batch_pdlp_handle.get_stream());
+    initial_dual.resize(0, batch_pdlp_handle.get_stream());
+    step_size      = std::numeric_limits<f_t>::signaling_NaN();
+    primal_weight  = std::numeric_limits<f_t>::signaling_NaN();
+    pdlp_iteration = -1;
+    percent_solved_by_batch_pdlp_at_root = f_t(0.0);
+    populated = false;
+  }
 };
```

As per coding guidelines: "Validate correct initialization of variable bounds, constraint coefficients, and algorithm state before solving; ensure reset when transitioning between algorithm phases (presolve, simplex, diving, crossover)".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/branch_and_bound/pseudo_costs.hpp` around lines 416 - 425: The batch_pdlp_warm_cache_t struct can retain stale warm-start data after the LP context changes because there is no reset path; add a member method reset() on batch_pdlp_warm_cache_t that clears initial_primal and initial_dual (release/resize to 0 on batch_pdlp_handle.get_stream()), sets step_size and primal_weight back to NaN, pdlp_iteration to -1, percent_solved_by_batch_pdlp_at_root to f_t(0.0) and populated to false, and then call this reset() from the pseudo_costs_t code path that reinitializes/changes the LP context (where the context-reset currently occurs) so any time cuts/slacks/original-LP are changed the warm cache is invalidated.

cpp/src/pdlp/solve.cu (2)
1049-1057: ⚠️ Potential issue | 🟠 Major

Validate `shared_sb_solved` before taking subspans.

The only guard here is `!empty()`. If the caller passes fewer solved flags than `new_bounds.size()`, `subspan(i, current_batch_size)` walks past the end of the coordination array and corrupts cooperative termination state. Check the span length once before the loop.

Suggested fix:

```diff
+cuopt_assert(settings.shared_sb_solved.empty() ||
+               settings.shared_sb_solved.size() >= max_batch_size,
+             "shared_sb_solved must cover every batch entry");
 for (size_t i = 0; i < max_batch_size; i += optimal_batch_size) {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/pdlp/solve.cu` around lines 1049 - 1057: The code only checks settings.shared_sb_solved.empty() but not its length relative to new_bounds, so subspan(i, current_batch_size) can go out of bounds; before the batching loop validate that when settings.shared_sb_solved is non-empty its size() (or extent()) is >= max_batch_size (or original_new_bounds.size()), and if not, either trim max_batch_size/current_batch_size accordingly or return/error; then inside the loop you can safely call settings.shared_sb_solved.subspan(i, current_batch_size) to populate batch_settings.shared_sb_solved. Ensure you reference settings.shared_sb_solved, original_new_bounds/new_bounds, max_batch_size and subspan in the check and handle the shorter span case consistently.
968-972: ⚠️ Potential issue | 🟠 Major

Don't let `sub_batch_size` bypass the memory ceiling.

`memory_max_batch_size` is computed specifically to avoid OOM, but a positive `settings.sub_batch_size` overwrites it unchecked. A caller can request a sub-batch larger than the memory-checked limit and reintroduce the allocation failure this code just avoided.

Suggested fix:

```diff
 size_t optimal_batch_size =
   use_optimal_batch_size ? detail::optimal_batch_size_handler(problem, memory_max_batch_size)
                          : max_batch_size;
-if (settings.sub_batch_size > 0) { optimal_batch_size = settings.sub_batch_size; }
+if (settings.sub_batch_size > 0) {
+  const auto requested_sub_batch_size = static_cast<size_t>(settings.sub_batch_size);
+  cuopt_assert(requested_sub_batch_size <= memory_max_batch_size,
+               "sub_batch_size exceeds the memory-checked batch size");
+  optimal_batch_size = requested_sub_batch_size;
+}
```

As per coding guidelines: "Verify correct problem size checks before expensive GPU/CPU operations; prevent resource exhaustion on oversized problems".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/pdlp/solve.cu` around lines 968 - 972: The current logic lets settings.sub_batch_size override the memory-checked limit, which can reintroduce OOMs; instead, when computing optimal_batch_size in solve.cu (symbols: optimal_batch_size, settings.sub_batch_size, memory_max_batch_size, detail::optimal_batch_size_handler, max_batch_size, cuopt_assert), clamp settings.sub_batch_size to the memory ceiling (e.g., use min(settings.sub_batch_size, memory_max_batch_size) or validate and reject values > memory_max_batch_size) so the final optimal_batch_size never exceeds memory_max_batch_size (and still respects max_batch_size), then keep the cuopt_assert as a final sanity check.
Verify each finding against the current code and only fix it if needed. In `@cpp/src/pdlp/solve.cu` around lines 968 - 972, The current logic lets settings.sub_batch_size override the memory-checked limit, which can reintroduce OOMs; instead, when computing optimal_batch_size in solve.cu (symbols: optimal_batch_size, settings.sub_batch_size, memory_max_batch_size, detail::optimal_batch_size_handler, max_batch_size, cuopt_assert), clamp settings.sub_batch_size to the memory ceiling (e.g., use min(settings.sub_batch_size, memory_max_batch_size) or validate and reject values > memory_max_batch_size) so the final optimal_batch_size never exceeds memory_max_batch_size (and still respects max_batch_size), then keep the cuopt_assert as a final sanity check.cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp (1)
303-305: ⚠️ Potential issue | 🔴 Critical

Keep batch solution collection opt-out, not opt-in.

With this default at `false`, generic batch LP solves now return empty primal/dual/reduced-cost buffers unless every caller remembers to opt in. Existing batch paths in cpp/tests/linear_programming/pdlp_test.cu still slice those buffers unconditionally, so this regresses the non-MIP batch API.

Suggested fix:

```diff
-// By default to save memory and speed we don't store and copy each climber's primal and dual
-// solutions. We only retrieve termination statistics and the objective values
-bool generate_batch_primal_dual_solution{false};
+// Preserve the historical batch behavior and collect per-climber solutions unless callers opt out.
+bool generate_batch_primal_dual_solution{true};
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp` around lines 303 - 305: Defaulting generate_batch_primal_dual_solution to false makes batch solves opt-in and breaks callers that expect populated primal/dual/reduced-cost buffers; change the bool generate_batch_primal_dual_solution default to true (so batch solution collection is enabled by default/opt-out) and update the adjacent comment to reflect "opt-out" semantics so existing tests and code that slice these buffers continue to work without modification.

cpp/src/branch_and_bound/shared_strong_branching_context.hpp (1)
38-56: ⚠️ Potential issue | 🟠 Major

Make the view helpers release-safe and overflow-safe.

These accessors still rely on `assert`, so a bad index becomes unchecked UB in release builds. `subview()` also computes `offset + count` in `i_t` before the cast, which can overflow; validate the casted sizes separately and use those converted indices for `operator[]`/`subspan`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/branch_and_bound/shared_strong_branching_context.hpp` around lines 38 - 56: The accessors (is_solved, mark_solved, subview) use asserts and can overflow in release builds; replace the assert-based checks with runtime bounds validation and throw std::out_of_range on failure, and avoid signed arithmetic overflow in subview by converting indices to size_t first. Concretely: in is_solved(i_t local_idx) and mark_solved(i_t local_idx) cast local_idx to size_t (e.g. size_t idx = static_cast<size_t>(local_idx)), verify local_idx >= 0 and idx < solved.size(), and throw std::out_of_range with a clear message instead of asserting; in subview(i_t offset, i_t count) cast both to size_t (size_t off = static_cast<size_t>(offset), cnt = static_cast<size_t>(count)), ensure offset and count are non-negative, check off <= solved.size() and cnt <= solved.size() - off, then return {solved.subspan(off, cnt)} to avoid offset+count overflow.

cpp/tests/linear_programming/pdlp_test.cu (1)
2243-2254: ⚠️ Potential issue | 🟠 Major

Replace the fixed 200 ms sleep with deterministic synchronization.

On afiro-sized batches PDLP can finish before the delay. That makes these tests flaky and can skip the `ConcurrentLimit` path entirely. Use a latch/promise/barrier that the solver thread signals once the cooperative path is live, then apply the external marks.

Also applies to: 2309-2320

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tests/linear_programming/pdlp_test.cu` around lines 2243 - 2254: The fixed sleep is flaky; replace it with deterministic synchronization (e.g., std::promise/std::future or std::latch) so the main thread waits until the solver thread actually enters the cooperative DS path. Create a promise/future (or latch), pass the signal into the pdlp_thread lambda (or have solve_lp expose a hook) and have the solver thread set the promise when the cooperative path is live; then have the main thread wait on the future before calling sb_view.mark_solved(i). Apply the same change to the other occurrence around the second pdlp_thread usage.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cpp/src/branch_and_bound/pseudo_costs.cpp`:
- Around line 1044-1054: When simplex_iteration_limit < 1 the code calls
initialize_pseudo_costs_with_estimate(...) which populates pc.strong_branch_down
and pc.strong_branch_up, but the function continues and later merges/overwrites
those estimates with UNSET/NaN values; to fix, add an early return (or
conditional skip of the merge/overwrite block) immediately after calling
initialize_pseudo_costs_with_estimate so the function exits with the
estimate-only pseudo-costs intact and does not proceed to the later
merge/overwrite logic before update_pseudo_costs_from_strong_branching().
- Around line 1587-1595: The current gate "if (rb_mode != 2)" skips the whole
taskloop when rb_mode==2 even if use_pdlp is false, so unreliable_list
candidates never get evaluated; change the condition to allow Dual Simplex
fallback when PDLP is disabled by using "if (rb_mode != 2 || !use_pdlp)" (or
logically equivalent) around the pragma/taskloop that references score_mutex,
sb_view, dual_simplex_obj_down/up, dual_simplex_status_down/up and
unreliable_list, and update any related log text so it correctly reflects that
DS will run when PDLP is unavailable.
In `@cpp/src/branch_and_bound/pseudo_costs.hpp`:
- Line 541: The pdlp_warm_cache member on pseudo_costs_t is mutable shared state
and can race when multiple workers perform reliability/strong branching; make
the cache worker-local or add synchronization: either move the
batch_pdlp_warm_cache_t<i_t, f_t> pdlp_warm_cache out of the shared
pseudo_costs_t (create a stack/local instance inside the branching worker
routine) or add a dedicated omp_mutex_t (e.g., pdlp_warm_cache_mutex) and wrap
every access/update to pdlp_warm_cache in a lock/unlock pair (similar to
existing omp_atomic_t/omp_mutex_t usage) to prevent cross-contamination of
device buffers, scalars, and populated state. Ensure all functions that read or
write pdlp_warm_cache (reliability/strong branching paths) use the new mutex or
the local per-worker instance.
In `@cpp/src/dual_simplex/simplex_solver_settings.hpp`:
- Around line 194-197: The member mip_batch_pdlp_reliability_branching is not
initialized in the class constructor, leaving it with indeterminate value;
update the constructor initializer list (next to
mip_batch_pdlp_strong_branching(0)) to include
mip_batch_pdlp_reliability_branching(0) so the default-constructed
SimplexSolverSettings/related class always sets this field to a defined value.
In `@cpp/src/mip_heuristics/CMakeLists.txt`:
- Line 17: Remove clique_table.cu from the MIP_LP_NECESSARY_FILES list and add
it to the MIP_NON_LP_FILES list in the CMakeLists.txt so that clique_table.cu
(which includes mip_heuristics/utils.cuh) is built only as MIP-non-LP code;
specifically, edit the lines that construct the MIP_LP_NECESSARY_FILES and
MIP_NON_LP_FILES variables to stop listing clique_table.cu in the former and
include it in the latter, and verify any targets or install/grouping logic that
consumes those variables still references the file from the new list.
In `@cpp/src/pdlp/pdlp.cuh`:
- Around line 142-143: The initialization of sb_view_
(dual_simplex::shared_strong_branching_context_view_t<i_t, f_t>) is passing
settings_.shared_sb_solved which is a cuda::std::span<std::atomic<int>> into a
constructor expecting std::span<std::atomic<int>>; explicitly construct a
std::span from the underlying data and size (use
settings_.shared_sb_solved.data() and settings_.shared_sb_solved.size()) so the
types match instead of relying on an implicit conversion between cuda::std::span
and std::span.
In `@cpp/src/pdlp/solve.cu`:
- Around line 897-903: The batch memory estimator always accounts for three full
output buffers but the actual allocations for the third (solution collection)
buffer are skipped when generate_batch_primal_dual_solution is false; update the
total_memory calculation in solve.cu so it conditionally charges the third
output buffer only when generate_batch_primal_dual_solution is true (adjust the
same logic around the corresponding block later in the function where
allocations are made; see references to total_memory, trial_batch_size,
problem.get_n_variables(), problem.get_n_constraints(), and
generate_batch_primal_dual_solution and apply the same fix for the duplicate
estimator at the other occurrence mentioned in the comment).
In `@cpp/src/pdlp/solver_settings.cu`:
- Around line 62-88: The bulk warm-start path in set_pdlp_warm_start_data is
writing initial_primal_weight and initial_step_size directly and bypasses the
numeric guards; update set_pdlp_warm_start_data to validate those values the
same way as set_initial_primal_weight and set_initial_step_size (either by
calling those two setters from set_pdlp_warm_start_data or by applying the same
cuopt_expects checks: >0, !isinf, !isnan) so the warm-start overload enforces
the same invariants before assigning initial_primal_weight_ and
initial_step_size_.
In `@cpp/src/pdlp/translate.hpp`:
- Line 12: Remove the unused CUDA device header include from translate.hpp:
delete the line including <mip_heuristics/problem/problem.cuh> because
translate.hpp only references detail::problem_t from
optimization_problem_interface.hpp and does not use any symbols from
problem.cuh; after removal, confirm translate.hpp still compiles standalone and
that no other code relies on that include being present.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 982cb111-beb8-4701-9cff-5285678a13b5
📒 Files selected for processing (23)

- cpp/include/cuopt/linear_programming/constants.h
- cpp/include/cuopt/linear_programming/mip/solver_settings.hpp
- cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp
- cpp/src/branch_and_bound/branch_and_bound.cpp
- cpp/src/branch_and_bound/pseudo_costs.cpp
- cpp/src/branch_and_bound/pseudo_costs.hpp
- cpp/src/branch_and_bound/shared_strong_branching_context.hpp
- cpp/src/dual_simplex/simplex_solver_settings.hpp
- cpp/src/math_optimization/solver_settings.cu
- cpp/src/mip_heuristics/CMakeLists.txt
- cpp/src/mip_heuristics/solver.cu
- cpp/src/pdlp/pdlp.cu
- cpp/src/pdlp/pdlp.cuh
- cpp/src/pdlp/pdlp_constants.hpp
- cpp/src/pdlp/restart_strategy/pdlp_restart_strategy.cu
- cpp/src/pdlp/solve.cu
- cpp/src/pdlp/solver_settings.cu
- cpp/src/pdlp/termination_strategy/termination_strategy.cu
- cpp/src/pdlp/termination_strategy/termination_strategy.hpp
- cpp/src/pdlp/translate.hpp
- cpp/src/pdlp/utilities/ping_pong_graph.cu
- cpp/tests/linear_programming/pdlp_test.cu
- python/cuopt_server/cuopt_server/utils/linear_programming/data_definition.py
💤 Files with no reviewable changes (1)
- cpp/src/pdlp/pdlp_constants.hpp
```cpp
omp_atomic_t<i_t> num_strong_branches_completed = 0;
omp_atomic_t<int64_t> strong_branching_lp_iter  = 0;

batch_pdlp_warm_cache_t<i_t, f_t> pdlp_warm_cache;
```

Don't store the warm-start cache as unsynchronized shared worker state.

`pseudo_costs_t` already uses `omp_atomic_t` and `omp_mutex_t` for data that workers update concurrently, but `pdlp_warm_cache` is a plain mutable member. If two workers hit reliability/strong branching at once, they can race on the cached device buffers, scalars, and `populated`, cross-contaminating warm starts between nodes. Make the cache worker-local or guard it with a dedicated mutex.

As per coding guidelines: "Ensure race conditions are absent in multi-GPU code and multi-threaded server implementations; verify proper synchronization of shared state".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/src/branch_and_bound/pseudo_costs.hpp` at line 541, The pdlp_warm_cache
member on pseudo_costs_t is mutable shared state and can race when multiple
workers perform reliability/strong branching; make the cache worker-local or add
synchronization: either move the batch_pdlp_warm_cache_t<i_t, f_t>
pdlp_warm_cache out of the shared pseudo_costs_t (create a stack/local instance
inside the branching worker routine) or add a dedicated omp_mutex_t (e.g.,
pdlp_warm_cache_mutex) and wrap every access/update to pdlp_warm_cache in a
lock/unlock pair (similar to existing omp_atomic_t/omp_mutex_t usage) to prevent
cross-contamination of device buffers, scalars, and populated state. Ensure all
functions that read or write pdlp_warm_cache (reliability/strong branching
paths) use the new mutex or the local per-worker instance.
```cpp
void pdlp_solver_settings_t<i_t, f_t>::set_initial_step_size(f_t initial_step_size)
{
  cuopt_expects(initial_step_size > f_t(0),
                error_type_t::ValidationError,
                "Initial step size must be greater than 0");
  cuopt_expects(!std::isinf(initial_step_size),
                error_type_t::ValidationError,
                "Initial step size must be finite");
  cuopt_expects(!std::isnan(initial_step_size),
                error_type_t::ValidationError,
                "Initial step size must be a number");
  initial_step_size_ = std::make_optional(initial_step_size);
}

template <typename i_t, typename f_t>
void pdlp_solver_settings_t<i_t, f_t>::set_initial_primal_weight(f_t initial_primal_weight)
{
  cuopt_expects(initial_primal_weight > f_t(0),
                error_type_t::ValidationError,
                "Initial primal weight must be greater than 0");
  cuopt_expects(!std::isinf(initial_primal_weight),
                error_type_t::ValidationError,
                "Initial primal weight must be finite");
  cuopt_expects(!std::isnan(initial_primal_weight),
                error_type_t::ValidationError,
                "Initial primal weight must be a number");
  initial_primal_weight_ = std::make_optional(initial_primal_weight);
```

The bulk warm-start path still bypasses these new numeric guards.

`set_pdlp_warm_start_data(...)` later in this file writes the warm-start `initial_primal_weight` and `initial_step_size` directly at lines 316-317, so callers using that overload can still inject <=0, inf, or nan values. Please reuse the same validation there so both entry points enforce the same solver-state invariant. As per coding guidelines: "Validate correct initialization of variable bounds, constraint coefficients, and algorithm state before solving; ensure reset when transitioning between algorithm phases (presolve, simplex, diving, crossover)".
🔧 Suggested direction

```diff
 cuopt_expects(last_restart_duality_gap_dual_solution != nullptr,
               error_type_t::ValidationError,
               "last_restart_duality_gap_dual_solution cannot be null");
+cuopt_expects(initial_primal_weight > f_t(0) &&
+                !std::isinf(initial_primal_weight) &&
+                !std::isnan(initial_primal_weight),
+              error_type_t::ValidationError,
+              "Initial primal weight must be finite and greater than 0");
+cuopt_expects(initial_step_size > f_t(0) &&
+                !std::isinf(initial_step_size) &&
+                !std::isnan(initial_step_size),
+              error_type_t::ValidationError,
+              "Initial step size must be finite and greater than 0");
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/src/pdlp/solver_settings.cu` around lines 62 - 88, The bulk warm-start
path in set_pdlp_warm_start_data is writing initial_primal_weight and
initial_step_size directly and bypasses the numeric guards; update
set_pdlp_warm_start_data to validate those values the same way as
set_initial_primal_weight and set_initial_step_size (either by calling those two
setters from set_pdlp_warm_start_data or by applying the same cuopt_expects
checks: >0, !isinf, !isnan) so the warm-start overload enforces the same
invariants before assigning initial_primal_weight_ and initial_step_size_.
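The shared guard the review asks for could be sketched as a single helper that both the single-value setters and the bulk warm-start overload call before assigning; `check_positive_finite` is a hypothetical name for illustration, not the cuOpt API.

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>
#include <string>

// Hypothetical shared guard (not the cuOpt API): both set_initial_primal_weight /
// set_initial_step_size and the bulk set_pdlp_warm_start_data path would call
// this before assigning, so neither entry point can accept <= 0, inf, or nan.
template <typename f_t>
void check_positive_finite(f_t value, const std::string& name)
{
  if (!(value > f_t(0)) || std::isinf(value) || std::isnan(value)) {
    throw std::invalid_argument(name + " must be finite and greater than 0");
  }
}
```

Funneling both entry points through one helper keeps the two invariants from drifting apart as new overloads are added.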
/ok to test bb50ad1
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cpp/src/pdlp/pdlp.cu`:
- Around line 780-798: Add a single upfront cuopt_assert before using
climber_strategies_[i].original_index to ensure the shared strong-branching span
matches the original batch width: when sb_view_.is_valid() is true, assert that
sb_view_ (or shared_sb_solved/span) size/width equals the original batch width
used to generate climber_strategies_ (so original_index is in-bounds for
shared_strong_branching_context_view_t). Place this check immediately inside the
if (sb_view_.is_valid()) block guarding the loop that accesses
climber_strategies_ (and add the same guard at the other occurrences noted
around the blocks using original_index at the locations corresponding to lines
~849-850, ~866-869, and ~928-929).
- Around line 789-790: When you set a climber to
pdlp_termination_status_t::ConcurrentLimit via
current_termination_strategy_.set_termination_status, do not serialize PDLP
iterates into that climber's result: either propagate the external solver's
result into the per-climber payload or skip filling primal/dual/reduced-cost
buffers, step metadata and the solved_by field for that climber. Locate the
blocks that read from pdhg_solver_/PDLP state and populate per-climber solution
data (the occurrences around the current call and the other ranges noted:
~818-849, 872-878, 897-928) and add a guard that if status ==
pdlp_termination_status_t::ConcurrentLimit then avoid populating the iterate
buffers and step metadata (or instead copy the externally-provided result
structure) and ensure solved_by is set consistently from the external-solver
branch.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 17941937-9cfb-4973-a40a-f3ce01b5a538
📒 Files selected for processing (1)
cpp/src/pdlp/pdlp.cu
if (sb_view_.is_valid()) {
  for (size_t i = 0; i < climber_strategies_.size(); ++i) {
    // If PDLP has solved it to optimality we want to keep it and resolved both solvers having
    // solved the problem later
    if (current_termination_strategy_.is_done(
          current_termination_strategy_.get_termination_status(i)))
      continue;
    const i_t local_idx = climber_strategies_[i].original_index;
    if (sb_view_.is_solved(local_idx)) {
      current_termination_strategy_.set_termination_status(
        i, pdlp_termination_status_t::ConcurrentLimit);
#ifdef BATCH_VERBOSE_MODE
      std::cout << "[COOP SB] DS already solved climber " << i << " (original_index " << local_idx
                << "), synced to ConcurrentLimit at step " << internal_solver_iterations_
                << std::endl;
#endif
    }
  }
}
Validate shared_sb_solved against the original batch size before using original_index.
sb_view_.is_valid() only proves the span is non-empty. These loads/stores use climber_strategies_[i].original_index, which stays in the original batch numbering after compaction. If the caller passes a shorter span, the assert inside shared_strong_branching_context_view_t disappears in release builds and this becomes an out-of-bounds atomic access. Please add a single upfront cuopt_assert that the shared span matches the original batch width.
Suggested guard
+if (sb_view_.is_valid()) {
+ cuopt_assert(sb_view_.solved.size() == settings_.new_bounds.size(),
+ "shared_sb_solved must match the original batch size");
+}
+
if (sb_view_.is_valid()) {
  for (size_t i = 0; i < climber_strategies_.size(); ++i) {

As per coding guidelines, "Ensure variables and constraints are accessed from the correct problem context (original vs presolve vs folded vs postsolve); verify index mapping consistency across problem transformations".
Also applies to: 849-850, 866-869, 928-929
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/src/pdlp/pdlp.cu` around lines 780 - 798, Add a single upfront
cuopt_assert before using climber_strategies_[i].original_index to ensure the
shared strong-branching span matches the original batch width: when
sb_view_.is_valid() is true, assert that sb_view_ (or shared_sb_solved/span)
size/width equals the original batch width used to generate climber_strategies_
(so original_index is in-bounds for shared_strong_branching_context_view_t).
Place this check immediately inside the if (sb_view_.is_valid()) block guarding
the loop that accesses climber_strategies_ (and add the same guard at the other
occurrences noted around the blocks using original_index at the locations
corresponding to lines ~849-850, ~866-869, and ~928-929).
current_termination_strategy_.set_termination_status(
  i, pdlp_termination_status_t::ConcurrentLimit);
Don't serialize PDLP iterates as the result for ConcurrentLimit climbers.
Once a climber is flipped to pdlp_termination_status_t::ConcurrentLimit, this method still fills its primal/dual/reduced-cost buffers and step metadata from pdhg_solver_ / PDLP state. That means the status says “externally finished”, but the payload is just PDLP’s last iterate, and solved_by is only written on the PDLP branch. Please either plumb the external solver result through here or skip populating per-climber solution payloads for ConcurrentLimit.
As per coding guidelines, "Validate algorithm correctness in optimization logic: simplex pivots, branch-and-bound decisions, routing heuristics, and constraint/objective handling must produce correct results".
Also applies to: 818-849, 872-878, 897-928
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/src/pdlp/pdlp.cu` around lines 789 - 790, When you set a climber to
pdlp_termination_status_t::ConcurrentLimit via
current_termination_strategy_.set_termination_status, do not serialize PDLP
iterates into that climber's result: either propagate the external solver's
result into the per-climber payload or skip filling primal/dual/reduced-cost
buffers, step metadata and the solved_by field for that climber. Locate the
blocks that read from pdhg_solver_/PDLP state and populate per-climber solution
data (the occurrences around the current call and the other ranges noted:
~818-849, 872-878, 897-928) and add a guard that if status ==
pdlp_termination_status_t::ConcurrentLimit then avoid populating the iterate
buffers and step metadata (or instead copy the externally-provided result
structure) and ensure solved_by is set consistently from the external-solver
branch.
…conditions are not met
/ok to test 0a8b10a
Actionable comments posted: 3
🧹 Nitpick comments (4)
cpp/src/branch_and_bound/pseudo_costs.cpp (4)
1617-1624: Minor: Guard against division by near-zero in pseudo-cost update. If `leaf_solution.x[j]` becomes very close to an integer due to numerical effects, `change_in_x` could be extremely small, producing very large pseudo-cost values that could distort branching decisions.

Optional defensive guard

+ constexpr f_t min_frac = 1e-8;
  f_t change_in_obj = std::max(obj - node_ptr->lower_bound, eps);
  f_t change_in_x = leaf_solution.x[j] - std::floor(leaf_solution.x[j]);
- pseudo_cost_sum_down[j] += change_in_obj / change_in_x;
+ if (change_in_x > min_frac) {
+   pseudo_cost_sum_down[j] += change_in_obj / change_in_x;
+ }
  pseudo_cost_num_down[j]++;

As per coding guidelines: "Check numerical stability... division by zero/near-zero".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/branch_and_bound/pseudo_costs.cpp` around lines 1617 - 1624, Guard the pseudo-cost update against division by near-zero: compute change_in_x = leaf_solution.x[j] - std::floor(leaf_solution.x[j]) and if fabs(change_in_x) is below a small threshold (e.g. 1e-12) either skip the update or clamp change_in_x to that threshold before doing pseudo_cost_sum_down[j] += (change_in_obj / change_in_x) and incrementing pseudo_cost_num_down[j]; keep the existing change_in_obj calculation (obj - node_ptr->lower_bound, eps) and leave the rb_mode/is_dual_simplex_done / sb_view.mark_solved logic unchanged.
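The guard described in this nitpick can be isolated into a standalone function; the threshold and the free-function form are illustrative, not cuOpt code.

```cpp
#include <cassert>
#include <cmath>

// Standalone sketch of the near-zero guard (threshold illustrative): skip
// the pseudo-cost update instead of dividing by a tiny fractional part.
double guarded_pseudo_cost_term(double change_in_obj, double x)
{
  const double change_in_x = x - std::floor(x);
  constexpr double min_frac = 1e-8;
  if (change_in_x <= min_frac) return 0.0;  // near-integer: skip the update
  return change_in_obj / change_in_x;
}
```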
735-737: Redundant check: assertion makes the if-condition always true. The assertion at line 735 guarantees `!populated` is true, making the if-check at line 737 redundant in debug builds. In release builds (where assert is stripped), if `populated` is unexpectedly true, the block would repopulate the cache, which might cause issues. Consider one of these approaches:
- Remove the if-check if the assertion is the invariant you want to enforce
- Remove the assertion and keep only the if-check with early return if already populated
- If both paths should be valid, add an else branch to handle the already-populated case
Option: Keep assertion, remove redundant if
  assert(!pc.pdlp_warm_cache.populated && "PDLP warm cache should not be populated at this point");
- if (!pc.pdlp_warm_cache.populated) {
    pdlp_solver_settings_t<i_t, f_t> ws_settings;
    // ... rest of the block
- }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/branch_and_bound/pseudo_costs.cpp` around lines 735 - 737, The assert on pc.pdlp_warm_cache.populated makes the subsequent if-check redundant in debug builds but unsafe in release builds; remove the assert line (the assertion referencing pc.pdlp_warm_cache.populated) and keep the existing if (!pc.pdlp_warm_cache.populated) block, adding an early return or skipping logic when populated is true so the code behaves correctly in both debug and release builds and avoids double-populating the PDLP warm cache.
614-628: Consider using a tolerance check for sigma instead of exact equality. While `sigma` is structurally expected to be exactly ±1, the direct floating-point equality comparison violates the coding guideline about using epsilon comparisons. If matrix values are ever scaled or perturbed during preprocessing, this assertion could unexpectedly fail.

Suggested defensive check

- if (sigma == -1) {
+ constexpr f_t sigma_tol = 1e-9;
+ if (std::abs(sigma - (-1.0)) < sigma_tol) {
    constraint_lower[i] = slack_lower + lp.rhs[i];
    constraint_upper[i] = slack_upper + lp.rhs[i];
- } else if (sigma == 1) {
+ } else if (std::abs(sigma - 1.0) < sigma_tol) {
    constraint_lower[i] = -slack_upper + lp.rhs[i];
    constraint_upper[i] = -slack_lower + lp.rhs[i];
  } else {
-   assert(sigma == 1.0 || sigma == -1.0);
+   assert(false && "sigma must be +1 or -1");
  }

As per coding guidelines: "use epsilon comparisons for floating-point equality checks".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/branch_and_bound/pseudo_costs.cpp` around lines 614 - 628, The code in pseudo_costs.cpp compares the floating value sigma to -1 and 1 using exact equality; change this to an epsilon-based check (e.g., use fabs(sigma - 1.0) < EPS and fabs(sigma + 1.0) < EPS with a small EPS like 1e-9 or a named constant) in the branch that assigns constraint_lower/constraint_upper, and replace the final assert(sigma == 1.0 || sigma == -1.0) with an assertion that one of the epsilon checks passed (or handle the unexpected value by logging/error). Update occurrences referencing sigma, slack_map, lp.A.x, constraint_lower and constraint_upper accordingly.
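The epsilon comparison suggested above can be sketched as a small classifier; the tolerance value is illustrative, and returning 0 lets the caller assert or log the unexpected case.

```cpp
#include <cassert>
#include <cmath>

// Epsilon-based classification of sigma (tolerance illustrative): returns
// -1 or +1 when sigma is within tolerance of those values, else 0 so the
// caller can assert or log instead of silently misclassifying.
int classify_sigma(double sigma, double tol = 1e-9)
{
  if (std::fabs(sigma + 1.0) < tol) return -1;
  if (std::fabs(sigma - 1.0) < tol) return +1;
  return 0;
}
```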
1189-1212: Clarify merge behavior when source is NONE in mode 2. At line 1192:

if (source == sb_source_t::PDLP || effective_batch_pdlp == 2) { sb_dest = value; }

In `effective_batch_pdlp == 2` mode, this writes `value` unconditionally. When `source == sb_source_t::NONE`, `value` is NaN. Consider adding explicit handling:

Suggested clarification

- if (source == sb_source_t::PDLP || effective_batch_pdlp == 2) { sb_dest = value; }
+ // In mode 2 (PDLP-only), always use PDLP result. In mode 1, use PDLP only when it wins.
+ // Note: if both failed, value is NaN which preserves the "unknown" state.
+ if (source == sb_source_t::PDLP ||
+     (effective_batch_pdlp == 2 && source != sb_source_t::NONE)) {
+   sb_dest = value;
+ }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/branch_and_bound/pseudo_costs.cpp` around lines 1189 - 1212, The code unconditionally assigns sb_dest = value when effective_batch_pdlp == 2 even if merge_sb_result returned source == sb_source_t::NONE (value is NaN); change the assignment to only set sb_dest when the source is valid (e.g. source != sb_source_t::NONE) or when value is finite (use !std::isnan(value) / std::isfinite), and ensure the merged_* counters and verbose logging (merged_nan / merged_from_pdlp / merged_from_ds and the "[COOP SB] Merge" message) handle the NONE case consistently; update the conditional around sb_dest assignment (the line involving effective_batch_pdlp and sb_source_t::PDLP) and any subsequent branches that assume a valid value from merge_sb_result.
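The guarded assignment proposed above, isolated into a toy function to show that a NaN from a failed merge no longer clobbers the destination; names mirror the snippet but the function itself is illustrative.

```cpp
#include <cassert>
#include <cmath>

enum class sb_source_t { NONE, DUAL_SIMPLEX, PDLP };

// Illustrative isolation of the guarded merge: in PDLP-only mode (2) the
// destination is only overwritten when a solver actually produced a value,
// so a NaN from a failed merge cannot clobber an earlier estimate.
void merge_sb_dest(double& sb_dest, double value, sb_source_t source, int effective_batch_pdlp)
{
  if (source == sb_source_t::PDLP ||
      (effective_batch_pdlp == 2 && source != sb_source_t::NONE)) {
    sb_dest = value;
  }
}
```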
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cpp/src/branch_and_bound/pseudo_costs.cpp`:
- Around line 1044-1054: When simplex_iteration_limit < 1 the function calls
initialize_pseudo_costs_with_estimate(...) but then continues into the merge
logic which can overwrite valid estimates with NaN (pdlp_obj_down/up and
dual_simplex_status_* are unset); add an early return immediately after
initialize_pseudo_costs_with_estimate(...) to skip the subsequent
merge_sb_result/merge logic so pc.strong_branch_down and pc.strong_branch_up are
not overwritten (this prevents the branch that checks sb_source_t::PDLP or
effective_batch_pdlp == 2 from writing NaN back into the pseudo-costs).
- Around line 778-785: The async device copies into cache.initial_primal and
cache.initial_dual use ws_primal.stream()/ws_dual.stream() which may outlive
ws_solution and cause a use-after-free; fix by performing the copy on a stream
owned by the cache or by synchronizing the source stream before destroying
ws_solution—e.g., construct rmm::device_uvector using cache's stream (or call
cudaStreamSynchronize/ws_primal.stream() after the copy) so that the async
memcpy initiated by rmm::device_uvector (used in the pseudo_costs.cpp block
populating cache.initial_primal/initial_dual and setting cache.populated) cannot
access freed source memory.
- Around line 1543-1560: The OpenMP task/taskloop pragmas in
reliable_variable_selection are being emitted even when not inside a parallel
region; wrap the omp task and omp taskloop usage with runtime guards (e.g. check
omp_in_parallel()) or provide an alternative sequential path so the code behaves
correctly in single-threaded runs. Specifically, before emitting the "#pragma
omp task" that calls batch_pdlp_reliability_branching_task and before the
"#pragma omp taskloop" block in reliable_variable_selection, detect
omp_in_parallel() and only use the task/taskloop when true; otherwise call
batch_pdlp_reliability_branching_task (and the taskloop body) directly in a loop
on the calling thread to preserve semantics. Ensure checks are adjacent to the
existing pragmas and reference reliable_variable_selection,
batch_pdlp_reliability_branching_task, and the taskloop block so the change is
obvious and localized.
---
Nitpick comments:
In `@cpp/src/branch_and_bound/pseudo_costs.cpp`:
- Around line 1617-1624: Guard the pseudo-cost update against division by
near-zero: compute change_in_x = leaf_solution.x[j] -
std::floor(leaf_solution.x[j]) and if fabs(change_in_x) is below a small
threshold (e.g. 1e-12) either skip the update or clamp change_in_x to that
threshold before doing pseudo_cost_sum_down[j] += (change_in_obj / change_in_x)
and incrementing pseudo_cost_num_down[j]; keep the existing change_in_obj
calculation (obj - node_ptr->lower_bound, eps) and leave the
rb_mode/is_dual_simplex_done / sb_view.mark_solved logic unchanged.
- Around line 735-737: The assert on pc.pdlp_warm_cache.populated makes the
subsequent if-check redundant in debug builds but unsafe in release builds;
remove the assert line (the assertion referencing pc.pdlp_warm_cache.populated)
and keep the existing if (!pc.pdlp_warm_cache.populated) block, adding an early
return or skipping logic when populated is true so the code behaves correctly in
both debug and release builds and avoids double-populating the PDLP warm cache.
- Around line 614-628: The code in pseudo_costs.cpp compares the floating value
sigma to -1 and 1 using exact equality; change this to an epsilon-based check
(e.g., use fabs(sigma - 1.0) < EPS and fabs(sigma + 1.0) < EPS with a small EPS
like 1e-9 or a named constant) in the branch that assigns
constraint_lower/constraint_upper, and replace the final assert(sigma == 1.0 ||
sigma == -1.0) with an assertion that one of the epsilon checks passed (or
handle the unexpected value by logging/error). Update occurrences referencing
sigma, slack_map, lp.A.x, constraint_lower and constraint_upper accordingly.
- Around line 1189-1212: The code unconditionally assigns sb_dest = value when
effective_batch_pdlp == 2 even if merge_sb_result returned source ==
sb_source_t::NONE (value is NaN); change the assignment to only set sb_dest when
the source is valid (e.g. source != sb_source_t::NONE) or when value is finite
(use !std::isnan(value) / std::isfinite), and ensure the merged_* counters and
verbose logging (merged_nan / merged_from_pdlp / merged_from_ds and the "[COOP
SB] Merge" message) handle the NONE case consistently; update the conditional
around sb_dest assignment (the line involving effective_batch_pdlp and
sb_source_t::PDLP) and any subsequent branches that assume a valid value from
merge_sb_result.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e6684700-8d0e-4e59-ac7e-f0676d2019c2
📒 Files selected for processing (3)
cpp/src/branch_and_bound/pseudo_costs.cpp
cpp/src/dual_simplex/simplex_solver_settings.hpp
cpp/src/pdlp/solve.cu
🚧 Files skipped from review as they are similar to previous changes (2)
- cpp/src/dual_simplex/simplex_solver_settings.hpp
- cpp/src/pdlp/solve.cu
// Need to use the pc steam since the batch pdlp handle will get destroyed after the warm
// start
cache.initial_primal = rmm::device_uvector<f_t>(ws_primal, ws_primal.stream());
cache.initial_dual = rmm::device_uvector<f_t>(ws_dual, ws_dual.stream());
cache.step_size = ws_solution.get_pdlp_warm_start_data().initial_step_size_;
cache.primal_weight = ws_solution.get_pdlp_warm_start_data().initial_primal_weight_;
cache.pdlp_iteration = ws_solution.get_pdlp_warm_start_data().total_pdlp_iterations_;
cache.populated = true;
Potential use-after-free: async device copy may not complete before source is destroyed.
The comment mentions needing to use a specific stream, but the code uses ws_primal.stream() and ws_dual.stream() from ws_solution. The rmm::device_uvector copy constructor performs an async memcpy on the given stream. If ws_solution (and its underlying device memory) is destroyed before the async copy completes, this could cause undefined behavior.
Suggested fix: use the cache's stream or synchronize
// Need to use the pc stream since the batch pdlp handle will get destroyed after the warm
// start
- cache.initial_primal = rmm::device_uvector<f_t>(ws_primal, ws_primal.stream());
- cache.initial_dual = rmm::device_uvector<f_t>(ws_dual, ws_dual.stream());
+ auto cache_stream = cache.batch_pdlp_handle.get_stream();
+ cache.initial_primal = rmm::device_uvector<f_t>(ws_primal, cache_stream);
+ cache.initial_dual = rmm::device_uvector<f_t>(ws_dual, cache_stream);
+ // Ensure copy completes before ws_solution goes out of scope
+ cudaStreamSynchronize(cache_stream);

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/src/branch_and_bound/pseudo_costs.cpp` around lines 778 - 785, The async
device copies into cache.initial_primal and cache.initial_dual use
ws_primal.stream()/ws_dual.stream() which may outlive ws_solution and cause a
use-after-free; fix by performing the copy on a stream owned by the cache or by
synchronizing the source stream before destroying ws_solution—e.g., construct
rmm::device_uvector using cache's stream (or call
cudaStreamSynchronize/ws_primal.stream() after the copy) so that the async
memcpy initiated by rmm::device_uvector (used in the pseudo_costs.cpp block
populating cache.initial_primal/initial_dual and setting cache.populated) cannot
access freed source memory.
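The lifetime hazard can be modeled on the CPU without CUDA; this toy struct stands in for the enqueued device memcpy and its stream synchronization, and all names are illustrative rather than RMM's API.

```cpp
#include <cassert>
#include <vector>

// Toy CPU model of the stream-lifetime issue: the "async" copy only records
// a pointer to the source, like a device memcpy enqueued on a stream, and
// synchronize() plays the role of cudaStreamSynchronize. If the source is
// destroyed before synchronize() runs, the copy reads freed memory: exactly
// the hazard described above.
struct async_copy {
  const std::vector<double>* src = nullptr;  // pending memcpy source
  std::vector<double> dst;
  void enqueue(const std::vector<double>& s) { src = &s; }
  void synchronize()
  {
    dst = *src;    // the copy must run while the source is still alive
    src = nullptr;
  }
};
```

The fix in the review follows the same shape: complete the copy (synchronize the stream) before the scope owning the source buffer ends.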
if (simplex_iteration_limit < 1) {
  initialize_pseudo_costs_with_estimate(original_lp,
                                        settings,
                                        root_vstatus,
                                        root_solution,
                                        basic_list,
                                        nonbasic_list,
                                        fractional,
                                        basis_factors,
                                        pc);
} else {
Return after the estimate-only fallback to prevent NaN overwriting valid estimates.
When simplex_iteration_limit < 1, the function populates pseudo-costs with estimates via initialize_pseudo_costs_with_estimate(). However, execution continues to the merge logic at lines 1173+. Since neither DS nor PDLP actually ran in this path, pdlp_obj_down/up remain NaN and dual_simplex_status_* remain UNSET.
At line 1192, when effective_batch_pdlp == 2:
if (source == sb_source_t::PDLP || effective_batch_pdlp == 2) { sb_dest = value; }

This unconditionally writes `value` (which is NaN from merge_sb_result returning {NaN, NONE}) back to pc.strong_branch_down/up, overwriting the valid estimates.
Suggested fix: add early return after estimate computation
if (simplex_iteration_limit < 1) {
initialize_pseudo_costs_with_estimate(original_lp,
settings,
root_vstatus,
root_solution,
basic_list,
nonbasic_list,
fractional,
basis_factors,
pc);
+ pc.update_pseudo_costs_from_strong_branching(fractional, root_solution.x);
+ settings.log.printf("Strong branching completed (estimate-only) in %.2fs\n",
+ toc(strong_branching_start_time));
+ return;
  } else {

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/src/branch_and_bound/pseudo_costs.cpp` around lines 1044 - 1054, When
simplex_iteration_limit < 1 the function calls
initialize_pseudo_costs_with_estimate(...) but then continues into the merge
logic which can overwrite valid estimates with NaN (pdlp_obj_down/up and
dual_simplex_status_* are unset); add an early return immediately after
initialize_pseudo_costs_with_estimate(...) to skip the subsequent
merge_sb_result/merge logic so pc.strong_branch_down and pc.strong_branch_up are
not overwritten (this prevents the branch that checks sb_source_t::PDLP or
effective_batch_pdlp == 2 from writing NaN back into the pseudo-costs).
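A toy model of the control flow above shows why the early return matters; the function and names are illustrative, not cuOpt code.

```cpp
#include <cassert>
#include <cmath>

// Toy model (names illustrative): without the early return, the merge step
// runs even though neither solver produced a result, and its NaN overwrites
// the valid estimate computed by the fallback path.
double finish_strong_branching(double estimate, bool early_return_after_estimate)
{
  double sb_dest = estimate;
  if (early_return_after_estimate) return sb_dest;  // proposed fix
  const double merged = std::nan("");  // merge_sb_result returned {NaN, NONE}
  sb_dest = merged;                    // unconditional write in mode 2
  return sb_dest;
}
```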
if (use_pdlp) {
#pragma omp task default(shared)
  batch_pdlp_reliability_branching_task(log,
                                        rb_mode,
                                        num_candidates,
                                        start_time,
                                        concurrent_halt,
                                        original_lp,
                                        new_slacks,
                                        leaf_solution.x,
                                        worker,
                                        candidate_vars,
                                        settings,
                                        sb_view,
                                        pdlp_warm_cache,
                                        pdlp_obj_down,
                                        pdlp_obj_up);
}
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Find call sites of reliable_variable_selection to verify they are within parallel regions
# Search for calls to reliable_variable_selection
rg -nC5 'reliable_variable_selection\s*\(' --type=cpp -g '!*pseudo_costs.cpp'

Repository: NVIDIA/cuopt
Length of output: 2273
🏁 Script executed:
#!/bin/bash
# Get broader context around line 841 in branch_and_bound.cpp
sed -n '800,900p' cpp/src/branch_and_bound/branch_and_bound.cpp | cat -n

Repository: NVIDIA/cuopt
Length of output: 5140
🏁 Script executed:
#!/bin/bash
# Search for calls to variable_selection method
rg -n '\.variable_selection\s*\(' cpp/src/branch_and_bound/branch_and_bound.cpp | head -20

Repository: NVIDIA/cuopt
Length of output: 361
🏁 Script executed:
#!/bin/bash
# Get context around line 942 to identify the function and parallel regions
sed -n '920,960p' cpp/src/branch_and_bound/branch_and_bound.cpp | cat -n

Repository: NVIDIA/cuopt
Length of output: 1885
🏁 Script executed:
#!/bin/bash
# Search for update_tree_impl to see where select_branch_variable is called
rg -n 'update_tree_impl\|select_branch_variable' cpp/src/branch_and_bound/branch_and_bound.cpp | head -30

Repository: NVIDIA/cuopt
Length of output: 38
🏁 Script executed:
#!/bin/bash
# Search for OpenMP pragmas in branch_and_bound.cpp to find parallel regions
rg -n '#pragma omp' cpp/src/branch_and_bound/branch_and_bound.cpp

Repository: NVIDIA/cuopt
Length of output: 276
🏁 Script executed:
#!/bin/bash
# Get broader context around the parallel regions to understand the overall structure
sed -n '2600,2650p' cpp/src/branch_and_bound/branch_and_bound.cpp | cat -n

Repository: NVIDIA/cuopt
Length of output: 2355
🏁 Script executed:
#!/bin/bash
# Search for run_scheduler and where worker threads execute
rg -n 'run_scheduler\|worker.*loop\|process_node' cpp/src/branch_and_bound/branch_and_bound.cpp | head -20

Repository: NVIDIA/cuopt
Length of output: 38
🏁 Script executed:
#!/bin/bash
# Look at the reliable_variable_selection function and surrounding context to understand where it's called
sed -n '1520,1580p' cpp/src/branch_and_bound/pseudo_costs.cpp | cat -n

Repository: NVIDIA/cuopt
Length of output: 3149
🏁 Script executed:
#!/bin/bash
# Search for all calls to pc_.reliable_variable_selection (the direct calls)
rg -n 'pc_\.reliable_variable_selection\|\.reliable_variable_selection' cpp/src/branch_and_bound/ --type=cpp

Repository: NVIDIA/cuopt
Length of output: 38
🏁 Script executed:
#!/bin/bash
# Search for all references to reliable_variable_selection
rg -n 'reliable_variable_selection' cpp/src/branch_and_bound/ --type=cpp

Repository: NVIDIA/cuopt
Length of output: 392
🏁 Script executed:
#!/bin/bash
# Find and examine single_threaded_solve
rg -n 'single_threaded_solve' cpp/src/branch_and_bound/branch_and_bound.cpp -A 30 | head -60

Repository: NVIDIA/cuopt
Length of output: 2958
🏁 Script executed:
#!/bin/bash
# Get the full single_threaded_solve function
sed -n '1797,1900p' cpp/src/branch_and_bound/branch_and_bound.cpp | cat -n

Repository: NVIDIA/cuopt
Length of output: 5223
🏁 Script executed:
#!/bin/bash
# Search for plunge_with function
rg -n 'plunge_with' cpp/src/branch_and_bound/ --type=cpp -A 5 | head -40

Repository: NVIDIA/cuopt
Length of output: 2157
🏁 Script executed:
#!/bin/bash
# Get the implementation of plunge_with
sed -n '1430,1550p' cpp/src/branch_and_bound/branch_and_bound.cpp | cat -n

Repository: NVIDIA/cuopt
Length of output: 5434
🏁 Script executed:
#!/bin/bash
# Search for update_tree function and what it calls
rg -n 'update_tree\(' cpp/src/branch_and_bound/branch_and_bound.cpp | head -5

Repository: NVIDIA/cuopt
Length of output: 331
🏁 Script executed:
#!/bin/bash
# Get the update_tree function implementation
sed -n '1282,1400p' cpp/src/branch_and_bound/branch_and_bound.cpp | cat -n

Repository: NVIDIA/cuopt
Length of output: 5990
Fix OpenMP task directives in reliable_variable_selection to work in both parallel and non-parallel contexts.
The #pragma omp task and #pragma omp taskloop at lines 1544 and 1578+ require an active parallel region to execute as intended. However, in single-threaded mode, single_threaded_solve() (line 2622 in branch_and_bound.cpp) calls plunge_with() → update_tree() → variable_selection() → reliable_variable_selection() outside any parallel region, causing the task/taskloop pragmas to either execute sequentially on the calling thread or behave incorrectly.
Either guard these pragmas with if (omp_in_parallel()) checks, restructure the call site to ensure it always runs within a parallel region, or conditionally compile different code paths for single vs. multi-threaded execution.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/src/branch_and_bound/pseudo_costs.cpp` around lines 1543 - 1560, The
OpenMP task/taskloop pragmas in reliable_variable_selection are being emitted
even when not inside a parallel region; wrap the omp task and omp taskloop usage
with runtime guards (e.g. check omp_in_parallel()) or provide an alternative
sequential path so the code behaves correctly in single-threaded runs.
Specifically, before emitting the "#pragma omp task" that calls
batch_pdlp_reliability_branching_task and before the "#pragma omp taskloop"
block in reliable_variable_selection, detect omp_in_parallel() and only use the
task/taskloop when true; otherwise call batch_pdlp_reliability_branching_task
(and the taskloop body) directly in a loop on the calling thread to preserve
semantics. Ensure checks are adjacent to the existing pragmas and reference
reliable_variable_selection, batch_pdlp_reliability_branching_task, and the
taskloop block so the change is obvious and localized.
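The omp_in_parallel() guard described above can be sketched as follows; work() stands in for batch_pdlp_reliability_branching_task, and the fallback shim keeps the sketch buildable even without OpenMP enabled.

```cpp
#include <cassert>

#if defined(_OPENMP)
#include <omp.h>
#else
// Fallback shim so the sketch also builds when OpenMP is disabled.
inline int omp_in_parallel() { return 0; }
#endif

int g_calls = 0;

// work() stands in for batch_pdlp_reliability_branching_task.
void work() { ++g_calls; }

// Spawn a task only when a parallel region is active; otherwise run the
// body inline so single-threaded callers keep the same semantics.
void dispatch_task()
{
  if (omp_in_parallel()) {
#if defined(_OPENMP)
#pragma omp task default(shared)
#endif
    work();
  } else {
    work();
  }
}
```

The same pattern applies to the taskloop block: test omp_in_parallel() once and fall back to a plain loop on the calling thread.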
/merge
This PR enables the following regarding batch PDLP:
Currently, we still keep BPDLP off by default, both at the root and in reliability branching.